Simulation Study of Imbalanced Classification on High-Dimensional Gene Expression Data

Abstract

Purpose: Classification of gene expression data helps in the study of disease. However, it faces two obstacles: imbalanced classes and high dimensionality. The motivation of this study is to examine the effectiveness of undersampling before feature selection on high-dimensional data with imbalanced classes.

Methods: The Least Absolute Shrinkage and Selection Operator (Lasso), which can select features, is used to handle high-dimensional modeling. Random undersampling (RUS) is used to deal with imbalanced classes. The decision tree (CART) algorithm is used to construct the classification model because it produces an interpretable model. Thirty simulated datasets with varying imbalance ratios are used to test the proposed approaches, Lasso-CART and RUS-Lasso-CART. The simulated datasets are generated from parameters of real data.

Results: The simulation results show that when the minority class accounts for more than 25% of the observations, the Lasso-CART method is appropriate. Meanwhile, RUS-Lasso-CART is effective when the minority class size is at least 20 observations.

Novelty: The novelty of this study lies in using a hybrid approach to address this problem.
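The random undersampling (RUS) step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name random_undersample, the 1:1 target ratio, the seed values, and the toy data are all assumptions made for illustration, and only the Python standard library is used. In the approach described above, a feature selector such as Lasso and a CART classifier would then be fitted on the balanced data.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows of the larger classes until every class has as
    many rows as the smallest class (a 1:1 target ratio is assumed here)."""
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        # Sample without replacement; the smallest class is kept whole.
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 90 majority rows (label 0), 10 minority rows
# (label 1), each with 5 made-up "expression" features.
data_rng = random.Random(42)
y = [0] * 90 + [1] * 10
X = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in y]

X_bal, y_bal = random_undersample(X, y, seed=1)
class_counts = Counter(y_bal)
print(class_counts)
```

Undersampling discards majority-class observations, which may explain why the abstract ties the usefulness of RUS-Lasso-CART to a minimum minority class size: with very few minority rows, the balanced dataset becomes very small.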

Related resources

Classification of High Dimensional and Imbalanced Hyperspectral Imagery Data

The present paper addresses the problem of classifying hyperspectral images with multiple imbalanced classes and very high dimensionality. Class imbalance is handled by resampling the data set, whereas PCA is applied to reduce the number of spectral bands. This is a preliminary study that seeks to investigate the benefits of using these two techniques together, and also to evaluate ...

On Mining Fuzzy Classification Rules for Imbalanced Data

The fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, such that it performs poorly with respect to the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...

Class-imbalanced classifiers for high-dimensional data

A class-imbalanced classifier is a decision rule for predicting the class membership of new samples from an available data set in which the class sizes differ considerably. When the class sizes are very different, most standard classification algorithms may favor the larger (majority) class, resulting in poor accuracy in minority class prediction. A class-imbalanced classifier typically modifies a ...

SVM Classification for High-dimensional Imbalanced Data based on SNR and Under-sampling

The support vector machine (SVM) is biased towards the majority class when the dataset is class-imbalanced, and the bias is even larger for high-dimensional data. To improve the classification accuracy of SVM on high-dimensional imbalanced data, we combine the signal-to-noise ratio (SNR) with an under-sampling technique based on K-means. In this article, we first apply SNR to feature selection to r...

Journal

Journal title: Scientific Journal of Informatics

Year: 2023

ISSN: 2407-7658, 2460-0040

DOI: https://doi.org/10.15294/sji.v10i1.40589